iNZight, Surveys, and the IDI

Tom Elliott

Te Rourou Tātaritanga
Victoria University of Wellington

tomelliott.co.nz

Updates

PhD thesis

  • submitted 9th April
  • defended 1st August
  • graduation tomorrow!

PhD thesis (TL;DR)

  • predicting buses is hard.
  • real-time traffic data from other buses to predict upcoming ones …
    • point estimates
    • interval estimates
  • \(\mathbb{P}\)(catch bus | I arrive by), \(\mathbb{P}\)(bus arrives before 9am)
  • useful for probabilistic journey planning
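As a rough illustration of the quantities above, suppose a bus's predicted arrival time is represented by a sample of simulated arrival times. Each estimate is then a simple summary of that sample. This is not necessarily how the thesis computes them. The normal draws, the 8:50 arrival time and the "catch" rule below are all made-up assumptions.

```r
# Hypothetical predictive sample for one bus: 1000 simulated arrival times,
# in minutes after 8:00am (normal draws used purely for illustration)
set.seed(2020)
eta <- rnorm(1000, mean = 55, sd = 4)

point_est <- median(eta)                     # point estimate
interval_90 <- quantile(eta, c(0.05, 0.95))  # 90% interval estimate

# P(bus arrives before 9am), i.e. within 60 minutes of 8:00am
p_before_9 <- mean(eta < 60)

# P(catch bus | I arrive by 8:50), treating "catch" as the bus reaching
# the stop no earlier than I do (ignores dwell time and timetable rules)
p_catch <- mean(eta >= 50)

round(c(median = point_est, interval_90, before_9am = p_before_9, catch = p_catch), 2)
```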

If you’re interested … https://tomelliott.co.nz/phd

Postdoc @ VUW @ UoA

  • MBIE Endeavour grant

    • Colin Simpson (VUW), Barry Milne (COMPASS), Andrew Sporle

    • Informatics for Social Services and Wellbeing …

    • more later!

  • Honorary position here (thanks James)

iNZight

iNZight main window


  • my side-project since 2013/14

  • shifting focus as the audience has evolved


Before 2015

  • schools
  • some university


2015–2019

  • education (school/university/MOOC)
  • unexpected places
    • data journalism
    • wildlife manager in Canada


Recently

  • democratisation

    See Chris Wild’s talks featuring hits like “We Will Plot You”

  • rapid research development (Andrew Sporle)

    for organisations/groups with low/no money/time/both


  • recent focus on surveys — now handled natively!

    • plots
    • summaries (tables of counts)
    • inference / modelling
    • data wrangling …
  • key goal is removal of barriers

Surveys and iNZight

Data

GUI

Explore

Export results/code

What if data is from a survey?

In

iNZight isn’t much better … or is it?!

Specify survey design

(Remember survey variables never have nice names)

mysurvey.zip

  • mysurvey.csv
  • mysurvey.svydesign

Demo

iNZight main window

mysurvey.svydesign

data = "mysurvey.csv"
weights = "wt0"
repweights = "^w[0-4]"
reptype = "JK1"

  • accessible
  • quickly open and explore
  • business as usual
    • plots
    • summaries/inference (population counts)
    • data wrangling
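For readers familiar with R's survey package, the four lines of mysurvey.svydesign read much like the arguments of survey::svrepdesign(). The sketch below assumes that mapping (it is not taken from iNZight's source) and uses a tiny made-up data set in place of mysurvey.csv.

```r
library(survey)

# Toy stand-in for mysurvey.csv (made-up values): four respondents, a main
# weight wt0, and four JK1 replicate weights w1-w4 (each drops one
# respondent and rescales the other three by 4/3)
mysurvey <- data.frame(
  y   = c(3, 5, 2, 6),
  wt0 = 10,
  w1  = c(0, 40, 40, 40) / 3,
  w2  = c(40, 0, 40, 40) / 3,
  w3  = c(40, 40, 0, 40) / 3,
  w4  = c(40, 40, 40, 0) / 3
)
# For the real file, something like: read.csv(unz("mysurvey.zip", "mysurvey.csv"))

des <- svrepdesign(
  data       = mysurvey,
  weights    = ~wt0,       # weights = "wt0"
  repweights = "^w[0-4]",  # regular expression selecting the replicate weight columns
  type       = "JK1"       # reptype = "JK1"
)

svytotal(~y, des)  # estimated population total of y, with a replicate-based SE
```

Note that the regular expression matches w1 to w4 but not wt0, since the character after "w" in wt0 is not a digit.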

(A few) Details

iNZight’s package collection

9+ iNZight* packages

  • iNZight (the GUI: collects user input, displays results)

  • iNZightModules (UI for time series, regression, maps, …)

  • iNZightPlots (graphs, summaries, inference)

  • iNZightTools (utility functions, data wrangling)

  • iNZightTS (time series)

  • iNZightMR (multiple response)

  • iNZightRegression (model summaries, residual plots)

  • iNZightMaps (lat/lng points, fill-in-the-shapefile maps)

  • plus vit and some others …

  • wrapper functions make programming GUIs easier

    • inputs \(\equiv\) arguments
  • packages don’t need the GUI

    • iNZightPlots::inzplot()

    • simple functions aimed towards novice coders

  • functions return the R code they run

GUI \(\rightarrow\) high level functions \(\rightarrow\) lower-level (e.g., ggplot)

An example: Filtering data

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          4.9         3.0          1.4         0.2  setosa
## 2          4.7         3.2          1.3         0.2  setosa
## 3          4.6         3.1          1.5         0.2  setosa
## 4          4.6         3.4          1.4         0.3  setosa
## 5          5.0         3.4          1.5         0.2  setosa
## 6          4.4         2.9          1.4         0.2  setosa
## [1] "iris %>% dplyr::filter(Sepal.Width < 3.5)"
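The wrapper idea can be sketched in a few lines. The function below is hypothetical (filter_numeric is not the iNZightTools API). It builds the dplyr code as text, evaluates that text, and attaches it to the result, which is enough to reproduce the code string in the filtering example.

```r
library(dplyr)

# Hypothetical wrapper: GUI inputs become arguments, and the function
# returns the filtered data together with the code it ran
filter_numeric <- function(data, var, op = c("<", "<=", ">", ">=", "==", "!="), value) {
  op <- match.arg(op)
  data_name <- deparse(substitute(data))
  # Build the code as text first, so it can be shown to the user
  code <- sprintf("%s %%>%% dplyr::filter(%s %s %s)", data_name, var, op, value)
  # Then evaluate exactly that code in the caller's environment
  result <- eval(parse(text = code), envir = parent.frame())
  attr(result, "code") <- code
  result
}

filtered <- filter_numeric(iris, "Sepal.Width", "<", 3.5)
head(filtered)
attr(filtered, "code")
```

Pasting strings into code is fine for a demo but unsafe for untrusted input. A real implementation would more likely build expressions (for example with rlang) and deparse them.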

A slightly more complex example

## # A tibble: 3 x 3
##   Species    Sepal.Length_median Sepal.Length_var
##   <fct>                    <dbl>            <dbl>
## 1 setosa                     5              0.124
## 2 versicolor                 5.9            0.266
## 3 virginica                  6.5            0.404
## [1] "iris %>% dplyr::group_by(Species) %>% dplyr::summarize(Sepal.Length_median = median(Sepal.Length, "
## [2] "    na.rm = TRUE), Sepal.Length_var = var(Sepal.Length, na.rm = TRUE), "                           
## [3] "    .groups = \"drop\")"

What about surveys?

  • modified wrapper functions to handle surveys

  • refactored GUI to pass around a ‘data-thing’ (data or survey)

## [1] "dclus2 %>% srvyr::as_survey() %>% srvyr::filter(api99 >= 700)"

Big thanks to the ‘srvyr’ package!
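A minimal sketch of the "data-thing" idea, assuming the survey and srvyr packages. The design is the two-stage cluster example from the survey package documentation. The helper filter_thing is hypothetical: it converts a survey design to srvyr's tbl_svy once, then uses the same dplyr verb for either kind of input.

```r
library(dplyr)
library(survey)
library(srvyr)

data(api, package = "survey")  # provides apiclus2 (among other data sets)

# Two-stage cluster sample of California schools, as in the survey package examples
dclus2 <- svydesign(id = ~dnum + snum, fpc = ~fpc1 + fpc2, data = apiclus2)

# Hypothetical helper: one entry point for a plain data frame or a survey design
filter_thing <- function(x, ...) {
  if (inherits(x, "survey.design") && !inherits(x, "tbl_svy")) {
    x <- as_survey(x)  # survey.design -> srvyr's tbl_svy
  }
  dplyr::filter(x, ...)
}

high <- filter_thing(dclus2, api99 >= 700)

# Design-based estimates on the filtered design
est <- high %>% summarise(
  schools = survey_total(),   # estimated number of schools with api99 >= 700
  api00 = survey_mean(api00)  # estimated mean 2000 API score among them
)
est

# Plain data frames go through the same function
head(filter_thing(iris, Sepal.Width < 3.5))
```

Converting once at the entry point is one way to let the rest of a GUI treat "data or survey" uniformly. Whether iNZight does exactly this is an assumption.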

Te Rourou Tātaritanga

How does this all relate to my postdoc?

Rourou = basket

Nā tō rourou, nā taku rourou, ka ora ai te iwi.

(With your food basket and my food basket the people will thrive.)

Tātaritanga = analysis



“Tools for analytics and sharing data for the betterment of communities.”


Or: “Informatics for Social Services and Wellbeing”

Primary goals

  1. Improve data standards

  2. Promote Māori data sovereignty

  3. Develop systems to support access

  4. Evaluate the synthesis of datasets

  5. Security and privacy implications

  6. Machine learning and AI methods

https://terourou.org


The Integrated Data Infrastructure (IDI)

  • database connecting data across NZ’s sectors

  • high security environment

  • but also other unnecessary barriers: coding!

iNZight to the rescue!

  • many upcoming researchers will have used iNZight at high school or university

  • no need to learn to code, OR remember how to do things you haven’t done in 2 years

  • currently working on deploying a demo of iNZight in the Stats NZ data lab — watch this space!

    • initial goal: confine to small datasets
    • primary researcher can prepare the data, using SQL to select and join tables
    • other researchers (without great coding skills) can easily explore the data — graphs, tables, models!
    • offers a restricted set of methods, which can help prevent novices from running really big queries and causing havoc on the servers
    • and build from there!
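The preparation step in this plan might look something like the following R sketch, using DBI with an in-memory SQLite database as a stand-in. The table names, columns and values are invented for illustration, and nothing here reflects how the Stats NZ data lab is actually configured.

```r
library(DBI)

# In-memory SQLite database standing in for the data lab's SQL server;
# the tables and columns are hypothetical toy examples
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "people", data.frame(
  id = 1:4,
  sex = c("F", "M", "F", "M"),
  birth_year = c(1980, 1975, 1990, 2001)
))
dbWriteTable(con, "income", data.frame(
  id = c(1, 2, 2, 4),
  year = 2019,
  amount = c(52000, 61000, 3000, 18000)
))

# The primary researcher selects and joins with SQL, producing one small table
small <- dbGetQuery(con, "
  SELECT p.id, p.sex, p.birth_year, SUM(i.amount) AS income_2019
  FROM people AS p
  JOIN income AS i ON i.id = p.id
  WHERE i.year = 2019
  GROUP BY p.id, p.sex, p.birth_year
")
dbDisconnect(con)

small
# This small table could then be saved (e.g. with write.csv) and explored in iNZight
```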

Outside the data lab

  • lots of data outside the data lab

  • many iwi groups, Pacific nations, etc. have specific needs for simple (to complex) population summaries/demographic outputs

  • iNZight means they can do it every 1–2 years without needing to train/retrain/pay expensive statisticians

  • iNZight also produces code: generate a script to re-run/edit as necessary (without having to do all the hard stuff first)

Bayesian demography

  • why limit yourself to tables when you can fit hierarchical Bayesian models with model-specific priors, likelihoods, … ?

  • John Bryant has a set of R packages (dembase, demest, …) for doing Bayesian demography

  • using them is a bit of a challenge (especially if you don’t do much R coding!)

  • so we tested out iNZight’s new add-on system …

DEMO

Other projects

Both work and ‘fun’

IDI Search App

  • to get access to the IDI, you need to put together a research proposal

  • putting together a research proposal requires knowing what data is available to investigate

  • that data is hidden away in the IDI


  • we put together a simple web app providing a searchable database so prospective (and current) IDI researchers can explore what’s available

  • built using ReactJS

DEMO

https://idi-search.web.app/

Bus display v2

  • the display in 302 was broken

  • so I rebuilt it, this time using ReactJS + d3

  • simpler than the last version (no ‘history’ as it just uses real-time data, no backing server)

DEMO

https://tomelliott.co.nz/bus-display/

Lots of ReactJS …

  • it’s my goal to, one day, put together a prototype of a new version of iNZight using ReactJS and Rserve

  • one version that runs on Windows / macOS / Linux / web

  • plus the option of using a local R server or a remote one (behind a firewall, etc.)

NO DEMO

Thank you

Github: tmelliott | iNZightVIT | terourou

Twitter: @tomelliottnz | @iNZightUoA | @terourou

tomelliott.co.nz | inzight.nz | terourou.org
